Fast Average-Case Pattern Matching on Weighted Sequences

Authors

Carl Barton

Chang Liu

Solon P. Pissis

Abstract

A weighted string over an alphabet of size σ is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain sequences, naturally arise in many contexts. In this article, we study the problem of weighted string matching with a special focus on average-case analysis. Given a weighted pattern string x of length m, a text string y of length n > m, and a cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in a weighted string, we present an algorithm requiring average-case search time o(n) for pattern matching for weight ratio $\frac{z}{m} < \min\left\{ \frac{1}{\log z}, \frac{\log \sigma}{\log z\,(\log m + \log\log \sigma)} \right\}$. For a pattern string x of length m, a weighted text string y of length n > m, and a cumulative weight threshold 1/z, we present an algorithm requiring average-case search time o(σn) for the same weight ratio. The importance of these results lies in the fact that these algorithms work in average-case sublinear search time in the size of the text, and in linear preprocessing time and space in the size of the pattern, for these ratios.
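To make the problem statement concrete, the following is a minimal sketch of the second setting in the abstract: a solid pattern matched against a weighted text under a threshold 1/z. It is a naive O(nm) baseline, not the sublinear average-case algorithm of the paper. The function name `occurrences`, the dictionary-per-position representation, and the sample data are assumptions made for illustration only.

```python
from fractions import Fraction


def occurrences(pattern, weighted_text, z):
    """Return start positions where `pattern` occurs with probability >= 1/z.

    `weighted_text` is a list with one dict per position, mapping each
    letter to its occurrence probability (hypothetical representation).
    The probability of an occurrence is the product of the per-position
    probabilities of the pattern's letters.
    """
    threshold = Fraction(1, z)
    m, n = len(pattern), len(weighted_text)
    hits = []
    for i in range(n - m + 1):
        prob = Fraction(1)
        for j, letter in enumerate(pattern):
            prob *= weighted_text[i + j].get(letter, Fraction(0))
            # Probabilities are at most 1, so the product never grows:
            # once it drops below the threshold we can stop early.
            if prob < threshold:
                break
        else:
            hits.append(i)
    return hits


# Illustrative weighted text of length 4 over the alphabet {A, C, G}.
weighted_text = [
    {"A": Fraction(1)},
    {"A": Fraction(1, 2), "C": Fraction(1, 2)},
    {"C": Fraction(3, 4), "G": Fraction(1, 4)},
    {"A": Fraction(1)},
]

print(occurrences("AC", weighted_text, 4))  # prints [0, 1]
print(occurrences("AC", weighted_text, 2))  # prints [0]
```

With z = 4 both candidate positions pass, since their probabilities are 1/2 and 3/8; raising the threshold to 1/2 (z = 2) leaves only position 0. Exact fractions are used here to avoid floating-point comparisons against the threshold.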


Similar resources

I-45: Advance MRI Sequences in Pelvic Endometriosis

Background: To assess MRI in diagnosing endometriotic lesions, with emphasis on the efficacy of T2*-weighted imaging. Materials and Methods: This prospective study included 48 females (22-38 years, average 29.6) clinically suspected of endometriosis from September 2009 to April 2012. MRI was performed with a 1.5 T imager (Siemens) with a body array coil. T1-, T2- and T2*-weighted (2D-FLASH) sequences were obtained ...


Two simple heuristics for the pattern matching on weighted sequences

Weighted sequences are used as profiles for protein families, in the representation of binding sites, and sequences produced by a DNA shotgun sequencing assembly. In this paper we present two simple heuristics for the pattern matching on weighted sequences. One is a simple heuristic which enables a faster validation between a weighted candidate and a weighted text. The other is applying the bad...


A Fast Generic Sequence Matching Algorithm

A string matching, and more generally sequence matching, algorithm is presented that has a linear worst-case computing time bound, a low worst-case bound on the number of comparisons (2n), and sublinear average-case behavior that is better than that of the fastest versions of the Boyer-Moore algorithm. The algorithm retains its efficiency advantages in a wide variety of sequence matching problems...


Pattern Matching on Weighted Sequences

Weighted sequences are used extensively as profiles for protein families, in the representation of binding sites and often for the representation of sequences produced by a shotgun sequencing strategy. We present various fundamental pattern matching problems on weighted sequences and their respective algorithms. In addition, we define two matching probabilistic measures and we give algorithms f...


On the Average-case Complexity of Pattern Matching with Wildcards

In this paper we present a number of fast average-case algorithms for pattern matching with wildcards. We consider the problems where wildcards are restricted to either the pattern or the text; however, the results can be easily adapted to the case where wildcards are allowed in both. We analyse the algorithms' average-case complexity and their expected-case complexity and show new lower bounds ...


Journal title:

CoRR

Volume abs/1512.01085, Issue -

Pages -

Publication date 2015
